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Efficient algorithms for the discovery of optimal control designs for coherent control of quantum 
processes are of fundamental importance. One important class of algorithms are sequential update 
algorithms generally attributed to Krotov. Although widely and often successfully used, the associ- 
ated theory is often involved and leaves many crucial questions unanswered, from the monotonicity 
and convergence of the algorithm to discretization effects, leading to the introduction of ad-hoc 
penalty terms and suboptimal update schemes detrimental to the performance of the algorithm. 
We present a general framework for sequential update algorithms including specific prescriptions 
for efficient update rules with inexpensive dynamic search length control, taking into account dis- 
cretization effects and eliminating the need for ad-hoc penalty terms. The latter, while necessary to 
regularize the problem in the limit of infinite time resolution, i.e., the continuum limit, are shown to 
be undesirable and unnecessary in the practically relevant case of finite time resolution. Numerical 
examples show that the ideas underlying many of these results extend even beyond what can be 
rigorously proved. 
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I. INTRODUCTION 

Quantum mechanics has evolved from a fundamental 
scientific theory to the point where engineering quan- 
tum processes is becoming a realistic possibility with 
many promising apphcations ranging from control of 
multi-photon excitations T, Y and vibrational and ro- 
tational states of molecules ^3n5j to entanglement gener- 
ation in spin chains [6l [7] and gate or process engineering 
problems in quantum information fSHllj: from ultracold 
gasses JJ_ and Bose- Einstein condensates p!5HT5] , to cold 
atoms in optical lattices (TBI [T7], trapped ions [THH2^ . 
quantum dots [27] and rings f28], to Josephson junctions 
[inilSn] and superconducting qubits [3T1I32]; from nuclear 
magnetic resonance [33] to attosecond processes [34, iSSj . 
The key to unlocking the potential of quantum systems 
is coherent control of the dynamics — and in particular 
optimal control design. The latter involves reformulating 
a certain task to be accomplished in terms of a control 
optimization problem for a suitable objective functional. 

One approach to solving the resulting control optimiza- 
tion problem is direct closed-loop laboratory optimiza- 
tion, which involves experimental evaluation of the objec- 
tive function (see e.g. [36l[32]). This approach has been 
applied to a range of problems, and in some settingts, 
e.g., in quantum chemistry, where high fidelities are not 
critical and the estimation of expectation values for large 
ensembles is fast and inexpensive, this approach is both 
feasible and effective. In other situations, however, in 
particular for complex state engineering and process op- 
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timization problems, each experimental evaluation of the 
objective function may require many experiments and 
expensive tomographic reconstruction, making it highly 
desirably to have pre-optimized control designs based on 
a model of the systems, even if imperfections in the model 
might necessiate a second stage of adaptive closed-loop 
optimization to fine-tune the controls. 

Model-based optimal control relies on solving the re- 
sulting control optimization problems using computer 
simulations of the dynamics and numerical optimiza- 
tion algorithms. Efficiency is crucial as solving the opti- 
mization problem requires the numerical solution of the 
Schrodinger equation many times over, which is gener- 
ally expensive for realistic models and practical problems, 
and the amount of simulation is required is therefore a 
main limiting factor determining which physical systems 
the technique can be applied to. In this context two main 
strands of optimization algorithms can be distinguished, 
namely, concurrent-in-time and sequential-in-time. The 
first can be readily understood using general results from 
numerical analysis and optimization theory. The second, 
motivated by control of dynamical systems, is often for- 
mulated in terms of iterative solution of a set of Euler- 
Lagrange equations. Despite being widely used to solve 
control optimization problems for quantum dynamics in 
the aforementioned applications, its algorithmic perfor- 
mance does not have such a well established theory, and 
many key issues such as its convergence behavior, the ef- 
fect of discretization errors and optimal update formulas 
have not been extensively studied. This motivates us to 
address these issues in this article. 

Although the theory can be generalized to open 
systems governed by both Markovian [38^ and non- 
Markovian dynamics [39] , our analysis in this article shall 
focus on Hamiltonian control systems, i.e., systems whose 
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evolution is governed by a Hamiltonian _fff dependent on 
external control fields f = t{t), which determines the 
evolution of the system via the Schrodinger equation 

^Uf{t)^~i.HtUdt), C/(0)-I, (1) 

where U{{t) is a unitary evolution operator and I the 
identity operator on the system's Hilbert space H, which 
we assume to have dimension N < oo here. The evo- 
lution of pure-state wavefunctions |^o) G H is given by 
|*f(i)) = Uf{t)\'^o), and for density operators pn (unit- 
trace positive operators on H representing mixed states 
or quantum ensembles) by pf{t) = Uf (t) poUf {t) , U^{t) 
being the adjoint of Uf{t). In this semi-classical frame- 
work the control f(i) is a classical variable representing 
an external field or potential such as a laser, microwave 
or radio-frequency (RF) pulse or an electric field created 
by voltages applied to control electrodes, for example. 
The dependence of the Hamiltonian Hf on the control 
can be complicated but often it suffices to consider a 
simple form such as a linear perturbation of the Hamil- 
tonian, Hf = Hq + f{t)Hi, for example. Although at 
any fixed time t the Hamiltonian depends only on a sin- 
gle parameter /(t), if Hq and Hi do not commute, i.e., 
[Ho,Hi] 7^ 0, varying a single control over time enables 
us to generate incredibly rich dynamics and effectively 
provides us with an unlimited number of parameters, a 
powerful idea, which is developed within the theory of 
non-linear open-loop control. We can exploit this free- 
dom to manipulate the dynamical evolution of the system 
to suit our needs and achieve a desired goal. The specific 
control objectives are as varied as the applications, and 
many different types of control objectives have been con- 
sidered, but most fall into one of the following categories. 

State-transfer problems involve designing a control 
f to steer a system from an initial state to a target state 
and come in two flavors, pure- and mixed-state trans- 
fer problems, formulated in terms of wavefunctions j^*) 
and density operators p, respectively. For optimal control 
purposes they are usually formulated in terms of maxi- 
mizing the overlap with a desired state |$) or cr, or so- 
called transfer fidelity 

5i(f) = 5i($|^f(T)), (2a) 
;?2a(f) = Tr(apf(T)). (2b) 

Maximizing the transfer fidelity is equivalent to mini- 
mizing the distance d{A, B) = ||j4 — _B||5 induced by the 
real Hilbert-Schmidt inner product {A\B) = 3fiTr(Al'B) 
for operators A, B on Ji, which for wavefunctions can 
be expressed as in terms of the standard inner 

product, and we can equivalently express the problem in 
terms of minimization of an error functional 

(£i(f) = i||*f(T)-ci>||2 = i-;?i(f), 

€2{i) = ^\\pf{T)-<j\\l=E,-d2a{f), 

where the constant Eq = Tr(pQ) + Tr((T^) takes into ac- 
count the conservation of the linear entropy under uni- 
tary evolution. 



Trajectory optimization problems have a similar 
flavor but instead of minimizing the distance from a de- 
sired state at a final time T, we aim to minimize the 
distance of the system's trajectory from a target trajec- 
tory |$d(i)) or ad{t) over time, leading to an objective 
functional of the form 

«3(f) = i /o1l*f (t) - mm dt = T- Ui), (3a) 

(£4(f)- i/olpf(t)-Td(i)|||dt = ii;o-;?4a(f), (3b) 

with S'3(f) = ^^^{<^d{t)\'^f{t)) dt for normalized wave- 
functions l^'f(i)), and 5'4Q(f) — J^TT{ad{t)pf{t)) dt and 

Eq — ^ Tr(pQ) + Tr((T^(t)) dt for density operator tra- 
jectories. 

Observable optimization problems involve optimiz- 
ing the expectation value of an observable Q (Hermitian 
operator on H) either at a final time T or over time [0, T], 
and also come in pure and mixed state versions, leading 
to the respective target functionals to be maximized 



55(f) - (f f(r)|Q|*f(T)), (4a) 

56(f) = /o'(*f(i)|g(t)|*f(i))dt, (4b) 

52(f) =Tr(Qpf(T)), (4c) 

54(f) =/o'Tr(Q(i)pf(i))di. (4d) 



The last two of these are just simple generalizations of 
52a and 54a since density matrices are Hermitian opera- 
tors on H. Observable and trajectory optimization prob- 
lems involving linear combinations of various objectives 
can obviously also be considered. 

Unitary gate optimization problems involve min- 
imizing the distance from a target gate V G U(A^) or 
equivalently maximizing the gate fidelity 

57(f) = mr{V^Uf{T)) ^N- i||C/f(r) - v\\l. (5) 

Unitary gate optimization problems and pure-state trans- 
fer problems can be formulated using absolute values in- 
stead of the real part 

5i.(f) = |(<f>|vI'f(T))p, (6a) 
57b(f) = |Tr(FtC/f(T))|, (6b) 

which corresponds to optimizing the overlap with the tar- 
get state or gate modulo a global phase factor, which is 
usually irrelevant. Mathematically, this corresponds to 
restricting the state space to CP^^^ instead of unit vec- 
tors in "H or the projective unitary group PU(A^) instead 
of U(7V). 

Historically, the first objective considered in this con- 
text was 55 [iQ], with further developments for this case 
carried out in [IT] and [12]. In the same series of papers, 
[15] considers 5if) and [H] introduces 52 in the context 
of general dissipative evolution. The method was applied 
to gate problems, using 5/ in [45| and extending this to 
(the square of) 5/6 in [3B]. Later ^Tj considered the two 
quite general types of objectives 5i + 53 and 55 +56- 
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While the objective functionals defined above corre- 
spond to many different control problems, a commonality 
between all of them is that they are simple functionals 
of the evolution operator Uf{t), in fact, with the excep- 
tion of ^Th, all target functionals are linear or bilinear 
functions of Uf{t), and therefore many properties can be 
derived from analysis of the latter. 



II. ITERATIVE SOLUTION SCHEMES FOR 
CONTINUOUS-TIME CONTROLS 

The optimization problems defined in the previous 
section usually cannot be solved directly by analytical 
means. Practical schemes for finding solutions therefore 
generally involve iterative algorithms. A large class of 
iterative solution schemes that have been proposed for 
problems of the types above can be described in simple 
terms by considering a pointer p moving back-and-forth 
within the time interval [0,r], overwriting the value of 
the field at that point. More specifically, one usually 
starts with an initial trial field i{t) = (/i(t), . . . , fM{t)), 
and then sweeps p forward through the whole time inter- 
val, while updating the value of fm{p) to 
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where 77 is a suitable real parameter and Wm (p) a- suitable 
real weight function. Here the limit in question is taken 
from the right and the functional derivative of ^ with 
respect to the field, jf^^, is evaluated at the current 
field f and in direction of the delta mass at p. Then 
p is swept backwards through the time interval, while 
updating /^(p) to 
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r/) lim 



(8) 



Wrnip) Sf„i{p) 

with the limit now taken from the left. This forward 
and backwards sweeping is repeated until some form of 
convergence is achieved. Of course, we can equally well 
start with the backward sweep. 

The notion of "overwriting the field" is made math- 
ematically rigorous by introducing two fields g and g, 
which are solutions to initial and final value operator dif- 
ferential equation problems, respectively, and taking the 
actual field f to be equal to g on the interval [0,p) and g 
on {p,T]. For example, in the gate optimization case we 
iteratively solve the initial value problem 

ihu^'^'>{p) = ij(g("))c/(")(p), c/(" 



g(")(p) = (l-77) 



ip) 



and the final value problem 

ihv^'^\p) = ff(g("))y(")(p). 



g(")(p) = (i-77')g^") 



r] 



Wm{p) Sfm{p)' 

T/(")(T) = V, 



Wm{p) Sfjnip)' 
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(9b) 
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FIG. 1: Convergence behavior of error S'"^ = 1 — 5'"^ of 
actual objective, and regularized objective -j'"^ for problem 
1 detailed in the appendix with finite time step A = 0.01 
for different values of A shows that monotonic increase of the 
regularized objective is not guaranteed in the discrete case if 
the weight of the penalty A is too small. (Initial field f = 0, 
r; = 1, 77' = 0, i.e., standard PK-Krotov-update, average cost 
per forward/ backward sweep/iteration ~ 10.4 seconds.) 



starting with some initial trial field g'^-*. 

There is a subtlety in the cases involving pure-state 
observables, which are not immediately reducible to an 
ODE formulation. These cases are usually dealt with by 
reducing them to pure-state transfer or trajectory track- 
ing problems with a target state ($| = (^o|C^g(^)^'5 or 
trajectory ($(t)| = {'^o\Ug{tyQ{t), where g here is the 
state of the field the last time the pointer hit p — T. 
The updates of (<I>| or (<&(i)| when updating g at p = T 
can only increase the fidelity, in the pure state observable 
case, due to the inequality 



e™|C/f(T)|^'o) 

.|C/f(r)|*o> - 



~m^oid\Ui{T)\^a)) > 
23fi(($„irf|C/f(r)|*o)) 

+(<f>„zrf|;7g(r)|*o> = 

(vtol [[/f (T) - (7g(T)]^ Q [Uf{T) - {/g(T)] |*o) > 



and similarly for the tracking problem, where the current 
field f is used to calculate {^new \ and g is used for {^oid\- 
For the typical choices of the objective functional ^ 
listed above, this scheme can be shown to yield a mono- 
tonic increase of the regularized objective function 



m 



I fm 1 1 to 



m)-m, e:(f) = iE„Jl/..ll 

\frn{t)\^W^{t) dt 



2 

w ' 



(11a) 
(lib) 



between sweeps (and in fact throughout progression of 
the pointer p) for any positive functions Wmit) and any 
choice of 77, 77' e [0, 2] As ^{f) is uniformly bounded, 
the sequence 3^"^ is bounded and must converge to some 
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value Z*- However, the best we can hope for is conver- 
gence to a critical point of 3(f), i.e., a point for which the 
variation of 3(f) with regard to independent variations of 
the fields f,n{t), 



suit) 6Uit) 



- fm{t)Wm{t) 



(12) 



vanishes, which happens if fm{t) = '^^^^rij) IJ^^) ■ 
other hand, a necessary condition for the functional S^(f ) 
to have an extremum is 



Vm. 



(13) 



This shows that this update scheme generally will not 
converge to a critical point, much less extremum, of the 
actual objective 5^(f), as the only critical point shared 
by the actual objective and the regularized objective 
3 is the trivial solution f = 0. In fact, convergence in 
the stronger sense of convergence of the field iterates f 
to some field f* that satisfies the critical point condition 



(12 1 does not follow trivially, and has only been shown 



under certain conditions such as sufficiently large penalty 
terms [IB], for which the resulting converged fields tend 
to be far from the global optimum of the unpenalized 
objective function, as can be seen by comparing the re- 
spectively critical point conditions (12) and (13). 

Furthermore, numerical solution of the initial and fi- 
nal value problems generally requires some form of time 
discretization. For a fixed time step At the change of 
the regularized objective J depends on the choice of the 
weights Wm of the penalty terms and monotonic increase 
is not guaranteed. When the weights are too small rel- 
ative to the time step At, the changes in the field am- 
plitudes can become too large and as a result 3 may de- 
crease. An example of this behavior is shown in Fig. [l] 
for a model system consisting of a linear arrangement 
of five qubits with uniform, nearest neighbor Heisenberg 
coupling subject to a fixed global magnetic field in the 
x-direction and five local Z-controls, one for each qubit, 
leading to a Hamiltonian of the form 



N 



(14) 



n— 1 n— 1 



^ (n) 

where (Ti etc are the usual tensor products of Pauli 



matrices, e.g., a. 



(1) 



/ and (Tt, (T„ 



and Oz the the standard Pauli matrices and / the 2x2 
identity matrix. For the following simulations we choose 
the constants £7 = 10 and J = 1. 

The convergence plot shows the value of the actual ob- 
jective S'^") = ^3?Tr(V"tc/(")(T))^ the unit gate fidelity, 

and the regularized objective 3'"-' = 5^'"-* — C*-"-* with the 
penalty term C^") = | J^m Wfm^Ww '^ith uniform weights 



Wmit) = A, as a function of the iteration number n for 
At = 0.01 and different values of A. For the smaller val- 
ues of A, after increasing monotonically for a few 
dozen iterations, starts to decrease as the cost term be- 
gins to dominate. One would usually terminate the algo- 
rithm at this point but forcing it to continue shows that 
despite the decrease in the value of the actual objec- 
tive functional 5^^"^ continues to increase. Monotonicity 
of 5'"-' can be recovered by increasing the weight of the 
penalty but at the expense of very low final fidelities, 
which casts doubt on whether optimizing a regularized 
objective is sensible at all if the real objective is to maxi- 
mize the fidelity. This problem has also been observed in 
the literature and as a remedy, a modified version of the 
regularized objective function C5'(f) — S^(f) — ^2{f) with 
a cost term 



^m\\w 



(15) 



has been employed [IP]. It can be shown that if the 
"reference functions" am{t) are chosen to be the values 
of the fields fm{t) in the previous iteration in the iterative 
update scheme above, giving rise to a dynamic cost term 



4">(f) 



fM _ f 



("-1)1 



2 

w ' 



(16) 



then we obtain the modified update rules for the forward 
and backward sweep 

lim /.„(t)+ ,/ff\ (17a) 

lim/„,(t)+ (17b) 

which lead to a monotonic increase in the actual ob- 
jective for ?7, rj' > 0. If the sequence of field iter- 
ates f converges to some field f* then we further have 

II /n"^ — fm ^^11 0, i.e., the cost term vanishes in the 

limit, lim„_i.oo 6^2"^ — 0' thus the resulting field f* ap- 
proaches a critical point of actual objective 5'(f). How- 
ever, such convergence is not guaranteed. 



III. OPTIMIZATION OF OBJECTIVE 
WITHOUT PENALTY 

The previous section shows that optimizing a regular- 
ized objective with a fixed cost term is problematic in 
that (i) the critical points of the regularized objective 
differ from those of the actual objective and (ii) in the 
discrete time setting even monotonicity cannot be guar- 
anteed. The former issue can be addressed by introducing 
a variable penalty term but at the expense of compound- 
ing technical problems about monotonicity and conver- 
gence. We shall now show that the introduction of a reg- 
ularized objective is unnecessary, and in fact undesirable 
in particular in the discrete time setting. 
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As observed above, a necessary condition for the fi- 
delity 5^(f) (infidelity or error £(f)) to assume a maxi- 
mum (minimum) at some f = f.^ is that f* be a critical 
point of 5^(f), i.e., that the variation of 5^(f) with respect 
to f vanish for f = f*. As all the objective functionals 
defined above are simple functionals of the evolution op- 
erator Uf{t), their respective critical points are defined 
in terms the critical points of Uf{t), which can be found 
by rewriting the Schrodinger equation in integral form 

Uf+Afit)-Ufit)^~i f Ufit,T)AH{T)Ur+Af{T)dT 

Jto 

as can be verified simply by differentiating both sides, 
with AH{t) = iJf+Af('r) - Hfir) and U{t2,ti) = 



U{t2)U{tiy. Iteratively substituting (f8) into itself 
yields a perturbative series expansion similar to a Taylor 
expansion for ordinary functions 



f/f+Af (t) - Uf{t) - i f Ufit, T)AH{T)Ur{T)dT 

^ JJ Ut{t,T2)AHir2)Uf{T2,n)AH{Ti)Utin)dndT2 

0<Ti<T2<t 

+ higher order terms. (19) 



The (n -I- l)th term in the expansion (19) 



i-^r J -J C/f(t,T„)Ai/(T„)C/f(T„) 

0<Ti<-<r„<t 

• • • [/f (ri)t Ai7(ri)[/f (Ti)dri • • • dT„ (20) 

can be bounded by ^||AiJ||", where ||Ai/||i refers to 
II Ai7(r)||(iT for any chosen unitarily invariant norm. 
This shows that the maps f i— >■ Uf{t) have functional 
derivatives of all orders for all t and are in a well-defined 
sense real analytic. By the polarization method, these 
expansions determine all mixed (functional) derivatives 
of both Uf{T) and C/f , with the n*'' order derivative in 
directions fi, . . . , f„ bounded in norm by C" Hfe^i l|ffc|l i 
in both cases. 

Assuming f — (fiit), . . . , /m(^)) is a vector of indepen- 
dent controls fm{t) in a suitable function space such as 
L'^[0,T] for g > 1, for example, defining the local vari- 
ations of the Hamiltonian with respect to the controls 
Hm{p) = fr^^fy' that the variation of the evo- 

lution operator with respect to local variations of the 
controls are given by 



SUfjt) 



-iUf{t,p)H„,{p)Uf{p), 



(21) 



from which we can effectively compute the variations of 
all of the objective functionals defined in the previous 



section, giving for instance. 



<5g4(f) rTr 



rTr{Q{t)U,{t,p)[p,{p),iHra{pm{p,t))dt 



Jp 

§^ = J^2-s{^,it)\Q{t)U,it,p)H„,ip)\^,ip)) dt. 

Given any ^ that is a sum of multi-linear terms in 
Uf{T) and C/f that are bounded in all entries of C/f, we can 
easily devise schemes that lead to a monotonic increase 
of 5^. Let 6 be a positive function that vanishes outside 
the interval [—1, 1] and integrates to 1, and let bs^p{x) = 
b{{x — p) / s) be the rescaled version of b centered at point 
p. Assuming f (p) bounded about p and s < p,T — p, the 
gradient of Uf{T) with respect to 6s is 



p+s 



UfiT,T)bs,p{T)H^ip)Uf{T)dT 

- -isUi{T,p)H^{p)Ui{p) + Ois") (23) 



since C/f (t) is locally Lipschitz ai t ~ p, where the error 
term is bounded independently of b. Under the same 
assumptions, the gradient of C/f (i) in direction bg is the 
function of t 



-isUf{t,p)H^{p)Uf{p) t>p ,2^, 
otherwise ^ ' 



up to an O(s^) error, as measured in any L"^ norm with 
q < oo. Applying the product rule to our general ^ with 
the approximations (23) and (24), dividing by s, then 
letting s — 0, shows that the value gf^^f^p'f is always well- 
defined and yields an expression for it. The exact sense in 
which this quantity matches the derivative of ^ evaluated 
at a (5-mass is that & is a positive mollifier. Denoting the 
addition of abg to the mth field /,„ by i+abslm, a Taylor 
expansion to first order gives 



d{f+abXn) = m 



as§^+Ois') (25) 



for each a. This shows that for any a of the same sign as 
s^f^lp) ' a.ssuming the former is non-zero, there is a suf- 
ficiently small scale s for which adding abg to the field 
leads to an increase in ^. Thus, if we add a sequence 
of such increments afos.pfc centered at different times pk, 
the corresponding sequence ^J'^'"'^ will be monotonically 
increasing. For a fidelity function 5, which is a continu- 
ous function from the compact domain U(iV) to M, the 
range of 5' is bounded. Thus S"'*^' is a uniformly bounded, 
monotonically increasing sequence, i.e., convergent. Con- 
vergence of ^^^^ to some value iJ* is not a very strong 
property, however; what is more interesting is whether 
we converge to a global maximum of the fidelity, or equiv- 
alently, whether the infidelity/error £ goes to 0, and the 
rate of convergence. 

Using this update rule, making bigger changes to each 
fm in its gradient direction should in principle offer larger 



6 




0.1 1 10 

Search Length 



FIG. 2: Static (red, increasing) and stateful (blue, decreasing) 
upper bounds on the asymptotic rate of convergence for back- 
and-forth sweeping in the continuous limit s — >■ for a single 
control field show that there is an asymptotic rate that cannot 
be exceeded no matter what search length q is used. 



increases in the fidelity 5^(f) for the same displacement 
incurred by the pointer p. On the other hand, we can 
expect larger choices of a to directly lead to higher am- 
plitudes of f , which in turn induce more oscillation in 
Uf, the gradients sf^H) ^^'^ therefore in the fields. So it 
would seem that in choosing a, we are making a trade- 
off between high amplitude and oscillation in the fields, 
which are generally undesirable features, and fast con- 
vergence of the algorithm. To make this intuition about 
speed rigorous, we can prove that there is an upper bound 
on the instantaneous rate of convergence which increases 
with the search length a and tends to zero as a van- 
ishes. However, this is a static analysis that cannot 
take into account the evolution of the algorithm. Once 
we move to a more sophisticated 'stateful' analysis, we 
can derive a different bound on the asymptotic rate of 
convergence which is actually decreasing in the search 
length a. The resulting bounds on the asymptotic con- 
vergence rates in the continuous limit are shown in Fig. [2] 
The non-trivial stateful bound in particular shows that 
larger search lengths, which are bound to result in greater 
speeds towards the start of the algorithm, will tend to 
slow down optimization in the long run, at least past a 
certain value of a. 



IV. FIELD DISCRETIZATION AND 
PRACTICAL OPTIMIZATION SCHEMES 

The previous section shows that we can in principle 
easily monotonically increase the objective function by 
iteratively updating the controls in a sequence of for- 
ward and backward sweeps. In practice solving the opti- 
mal control problem numerically requires discretization 
of time. This is often done implicitly but we shall see that 
the choice of discretization restricts the class of mono- 



tonic schemes that are available compared to the con- 
tinuous case. It is therefore desirable to explicitly dis- 
cretize the problem and derive optimization schemes for 
this case. We can do this by choosing a basis function b, 
with a step size s and a set of positions pk- The most 
common choice is to partition of the interval [0, T] into a 
fixed number of intervals of size At = T/K and restrict 
the controls fm{t) to be piecewise constant functions 

K 

frn{t) ^^fmkXk, (26) 
k=l 

where Xk is the characteristic function of the interval 
Ik = [tk-i, tk), to = and tx = T. In this case we have 

Ui + «XfeI™) = m + + 0{a'At% (27) 

Ojmk 

where Im indicates that we are adding the basis function 
to the field /™(t). For a given fixed At the 0{a'^At'^) 
term in Eq. ( |27[ ) need not be negligible and a must be 
chosen carefully to ensure 5'(f -I- axk^m) > 5^(f)- In the- 
ory choosing the time step At and search length a small 
will ensure 5^(f + axk'^m) > 5^(f): i-6-, a monotonic in- 
crease in the objective functional, but this will result in 
tiny increases in the objective function in each step and 
thus slow convergence. For the discretized version of the 
problem one can actually prove stronger convergence re- 
sults in the sense that under relatively mild conditions, 
the sequence of field iterates f^"-* must either converge 
to a critical point f* of the fidelity function or diverge to 
infinity (see appendix [b| . 

A. Time Resolution and Gradient Accuracy 

The first critical choice we have to make is the time res- 
olution or time step At, which effectively determines the 
finite-dimensional subspace of the infinite-dimensional 
function space L'^[0, T] we choose to restrict the optimiza- 
tion to. Considering that we started with a continuous- 
time optimal control problem, it may seem natural to 
choose small time steps At to approximate the contin- 
uous case, and it is certainly true that choosing At too 
large may prevent us from being able to reach high fideli- 
ties by restricting the search space too much. In general, 
a minimum requirement for controllability is that the di- 
mensionality of the search space M x K he no less than 
the dimension of the state space. For optimization of five- 
qubit unitary gates, for example, the state space is the 
unitary group U(32), which has dimension 32^. Thus 
we will require at least MK > 1024 and higher time 
resolutions may be necessary to achieve high fidelities, 
although the fidelities considered satisfactory in practice 
may be low in this context. But aside from such con- 
siderations, small time steps are not necessarily a good 
idea. Simple analysis shows that the computational cost 
per iteration for a fixed problem and system size N is 



7 



determined by the number of time steps K, and thus 
A< = T/K if the target time T is fixed. Higher time 
resolutions, i.e., smaller Ai, are therefore computation- 
ally more expensive. Of course, this cost may be offset 
by larger gains per iteration. However, we shall see that 
the discrete rate of convergence is always of order Ai, 
and for unitary gate optimization problems with fixed 
proportional search length the upper bound on the rate 
of convergence is of order At^, for example, suggesting 
that larger /S.t may actually facilitate faster convergence. 
Another reason why very small time steps are often unde- 
sirable are the characteristics of the fields, e.g., to avoid 
excessively complex and noisy solutions. 

Choosing larger At requires careful reconsideration of 
the gradient computation, however. Assuming for sim- 
plicity that the total Hamiltonian is linear in the controls 



M 



(28) 



with time-independent Hm, we have SH[i]/Sfm = H„ 
and Eq. ( 19 ) gives immediately 



dUtjT) 



mk 



Uf{T, tk)J7nkUf{tk, to) 



'J mk 



Ut{tk,t)i^iHrn)Ufit,tk)dt, 



(29a) 
(29b) 



tk-l 



from which we can calculate the various gradients, af- 
ter setting {^f{tk)\ {^\Uf{T,tk) and Qf{tk) := 
Ui{T,tkVQUf{T,tk), as 



df^k 

dfmk 
dfmk 



3?($f(tfc)|J™fc|*f(tfc)) 

25R(($f(tfc)|J™fc|*f(tfc))(^'f(tfc)|$f(<fc))) 

Tr(Qf(tfe)[J„ifc,pf(tfe)]) 

2K(*f (T) I Qt/f (T, i fe ) J„fe I *f (tfc )) 

^TT{V^Ut{T,tk)JrnkUf{tk)) 



§^=^ {TiiV^UfiT,tk)J.nkUf{tk)) 



Tr(yt[/f (T)) 
|Tr(Vt(7f(T))| 



For small At the integral defining J^k can be approx- 
imated by replacing the integrand by its value at some 
point, e.g., the right endpoint 



J, 



mk 



^iH„,)At. 



(31) 



The accuracy of this first-order approximation depends 
on At and the eigenvalues of the total Hamiltonian 
H{{) ^ Hq + Y,m f'n{t)Hm, i-B., the approximation is 
liable to break down when norm of any of the Hamilto- 
nians Hm is large, or if the field amplitudes become too 
large unless very small time steps are used. For piecewise 
constant controls H[i{tk)] is constant on the interval Ik, 
and it is easy to derive an exact formula for the gradient 
by evaluating the integrals Jmk analytically, noting that 



Uf{tk,t) = e(*'=-*)^, and thus 

J,nk= / e^^(-ii/„Oe~"^dr 

where B = —iH{i{tk)) and 

n=0 ^ ' 

For Hamiltonian systems 7(2) can be evaluated via the 
eigendecomposition of the skew-Hermitian matrix B = 
VAV''^, where A = diag(Ar) and purely imaginary. 
This allows us to evaluate 7(At adB){~iHm), noting that 



N 



Jrnk = ^ \v7-){Vr\Jnik\Vs){Vs 



(34) 



where {vr\Jmk\vs) are determined by 7(2) evaluated at 
the eigenvalues of the adjoint ads times At, which are 
determined by the differences of eigenvalues A^. of B, 

LOrs = Ar — Ao! 



{vr\Jmk\vs) = 7(a;,.sAt)(u,. 



(35) 



The first order approximation j{uJrs^t) = 1 is off by close 
to min{At|cjrs|/2, 1}. Thus, to ensure that the standard 
approximation is reasonably accurate we require At < 
II ads 11""'^ where ||A|| is the operator norm, max|eig(^)|. 
In problem 1 it can be shown that || ads || ~ 100, with the 
dominant contribution being Hq, which shows that for 
At = 0.01 standard approximation is over 95% accurate 
while for At — 0.1, x can be as large as 10, i.e., jiix) 
is far from 1 and the error in the gradients is huge, and 
we are in fact performing more of a random walk than a 
gradient search! A histogram of the actual distribution 
of the gradient overlaps 



exact 



|V?e 



|V?a 



(36) 



for problem 1 in Fig. [4] confirms this, showing that the 
first order approximation is excellent for At = 0.01 — the 
distribution is narrow with a median overlap of 99.77% 
— while for At ==0.1 the distribution is broad with a 
median overlap of only 41.80%. 

Alternatively, it can be verified by induction that 

(AtadB)"i/,„ - E ( fc ) Ai"i?""'=i^m(-5)' (37) 

k=0 ^ ^ 



leading to 



At" 



-t{AtadB)H^ = J2—^J2 
^-^ n + 1 ^-^ 

n=0 k={) 



B^-^H^i-B)' 
{n-k)\k\ 



(38) 



One possibility to evaluate this series is to truncate the 
infinite sum at some A'^ — 1 > 1 and invert the order 
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FIG. 3: 7(42:) shows that |7(ia;)| is close to 1 for \x\ < 1 but 
quickly drops off for larger x leading to large errors in the 
standard gradient approximation. 
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FIG. 4: Histogram plot of overlap of first order gradient 
approximation with the exact gradient for problem 1 for 
At = 0.01 and At = 0.1 (red lines: medians). 



for n = 0, . . . , — 1. This relation uniquely determines 
the rj as the nodes, and aj as the corresponding weights 
of Gauss-Legendre quadrature over the interval [0, 1]. 

The series expansion is preferable to the eigendecom- 
position for pure-state or state vector optimization prob- 
lems; in particular for a fidelity function such as 
■Sib or 3^5, whose gradient in step k is expressible in 
terms of the real inner product of the state l^'f(ifc)) 
and some vector ($|, as ($| Jmfcl^f (^fc))- This expres- 
sion can be approximated to order N by first comput- 
ing B'^l^'f (tfc_i)) and ($f (tfe_i^'= for_^= l,...,N~l 

These pro- 



(391 



and then applying either of ( 38 1 or 
cedures use only matrix-vector or vector-vector opera- 
tions, avoiding the need for relatively expensive matrix 
multiplications and an eigendecomposition. Specifically, 
Eq. ( [38| needs MN + 2N — 2 matrix-vector products 
and N({N + l)/2 + M) vector operations to evaluate 
for all controls, while Eq. (39) achieves the same or- 
der of approximation with only MN /2 + 2N — 2 and 
{N + M)N/2 operations respectively. The series expan- 
sion therefore can be applied to state transfer problems 
for high-dimensional systems with sparse Hamiltonians, 
where the eigendecomposition is infeasible. However, ad- 
ditional measures such as scaling and squaring may be 
needed for larger time steps to ensure that the series ap- 
proximation can be truncated for small N . Another al- 
ternative are finite difference approximations to estimate 
the gradients. These issues are further explored in the 
context of spin dynamics for large-scale systems in [SD] . 

We can also derive an efficient series approximation 
for density matrix and unitary gate optimization prob- 
lems with fidelity functions 5^2, 5^7 or ^-ji,. The gradi- 
ents on step k of these fidelities can all be expressed 
as Tr(AJmfe) for some skew-Hermitian matrix A, which 
is [pf{tk)iQi{tk)\ for ^2, and the skew-Hermitian part 
(X - ) /2 of X = C/f (t fe ) y t [/j. (T, ) for 5^7 or the phase 
multiple of X for ^•jb- Using the fact Tr(M^[X,y]) = 
Tr([W^, iteratively, we can re-write Tv{AJmk) as 



of summation to reduce the number of required matrix 
products to 3A^ — 4. Any such sum of the first N terms of 
the series yields an approximation to the exact gradient 
expression of order N in At. For N even we can reduce 
the number of required products to 2A^ — 2 by summing 
A^/2 terms of the form 



V fe = 



> fc=0 



fc! 



(39) 



for suitable choices of Oj , Vj . To recover the first N terms 
in the series, the coefficients aj,rj must satisfy 



N/2 



1 



x" dx 



(40) 



Ar 



Kn=Q 



(n + 1) 



yad!!B(A)ziJ„ 



At 



(41) 



which when truncated to its first TV terms, requires A^ — 1 
matrix multiplications (plus N + M negligible element- 
wise matrix operations) to evaluate for all controls, not- 
ing that the commutator [X, Y\ of skew-Hermitian ma- 
trices X, Y is the skew-Hermitian part of 2XY . Formula 



(351, however, has the advantage of being exact, and its 
numerical cost being independent of At. Furthermore, 
once the eigendecomposition of H[f{tk)] is known, the 
computation of the evolution operators C/f(tfe,^fe-i) be- 
comes trivial. This makes it suitable — and in many 
cases probably preferable — for use in density matrix 
and unitary gate optimization problems. 
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FIG. 5: Evolution of fidelity (first 10 iterations) for forward- 
only (top) forward-backward (middle) and split (bottom) up- 
date for At — 0.01. For continuous forward-backward sweeps 
the gradient quickly becomes very small and the fidelity barely 
increases after a few iterations. For the forward-only and split 
update the switch-overs enable revival of the gradients and 
thus the rate of fidelity decrease, thereby accelerating conver- 
gence. 
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FIG. 6: Convergence behavior for forward-only, forward- 
backward and split update for At — 0.01 and a = 5000 and 
At = 0.1 and a = 200 shows that forward-only update is 
preferable to back-and-forth update and split update is even 
better. The differences decrease with larger time steps. 



B. Variants of Sequential Update 

The straightforward discrete analogue of the contin- 
uous forward and/or backward sweeping is sequentially 
updating the fields i{tk) for k = 1 to k = K (or k = K 
to fc = 1) according to the rule 



f("+i)(ife) = f(")(tfc)-t-aVfey(f(")), 



(42) 



where VkS = ( gf ^ )m=i is the gradient vector at time 
tk, and iterating the procedure, starting with an initial 
trial field f as before. As in the continuous case, we 
have the option of sweeping forward and backward, con- 
tinually updating the fields in both directions, as e.g., 
in [IT] and the generalized scheme proposed by [H], or 
updating the fields only in one direction, usually the for- 
ward sweep, as in the PK-Krotov version of the iterative 
update scheme discussed in Sec. |lTj where 77 = 1 and 
77' = 0. Intuitively one might expect that updating the 
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fields in both directions should accelerate convergence, 
but convergence analysis shows that this is not the case. 
Continually updating the fields in both directions in fact 
reduces the asymptotic rate of convergence. 

This is illustrated in Fig. |6j which shows that the 
asymptotic rate of convergence is substantially greater 
for the forward-only update than for forward-backward 
update, in particular for small At, i.e., updating on the 
forward sweep only is preferable to updating on both 
sweeps. This can intuitively be explained as follows. The 
sequential update scheme aims to minimize the gradient 
at each time step. By continuity, minimizing the gradient 
at time tends to decrease the gradient at the subse- 
quent time step tk+i- As the magnitude of the gradient is 
the main factor limiting the increments in the fidelity at 
each time step, we quickly reach an asymptotic regime. 
If we only update in one direction, however, there is a 
step jump at every iteration, whenever we switch from 
t = Ik to t — ti, which allows the gradient to rebound, 
facilitating larger gains in subsequent steps. 

These observations suggest that updating the fields in 
a strictly sequential manner is not the optimal strategy, 
and that convergence might be accelerated by introduc- 
ing step jumps. For example, instead of updating the 
fields sequentially in a single forward or backward sweep, 
we could update half the fields i{tk) in the forward sweep 
and the other half in the backward sweep. We could 
choose the fields to be updated in the forward sweep ran- 
domly or choose to update all odd time slices f{t2k-i) 
in the forward sweep, and the even ones in the back- 
ward sweep. The gains of such schemes per single time 
slice update must be offset against increased computa- 
tional overheads associated with such non-sequential up- 
dates, chiefiy additional matrix multiplications to prop- 
agate the changes. However, there is one simple vari- 
ations of strictly sequential update that can be imple- 
mented without increasing the total number of operations 
(matrix exponentials, matrix multiplications and gradi- 
ent evaluations) per single iteration: if wc update the 
first K/2 time slices consecutively in the forward sweep, 
propagate without updating to k = K, and update the 
second half of the field consecutively in the backward 
sweep and then back-propagate from k — K/2 to fc = 1 
without updating. Fig. |6] shows that this split update 
strategy outperforms the forward-only update although 
the gains are smaller than for back-and-forth update ver- 
sus forward-only update, and the advantage of the split 
update scheme diminishes for larger time steps, not too 
surprisingly, as for fewer and larger time steps the local 
gradients do not die off as fast, and therefore the rebound 
effect induced by the switch-overs is reduced. The cost 
per iteration for all update schemes was about ss 3.05 
seconds for At — 0.01 {K — 3000) using the first-order 
gradient approximation, which is accurate for this time 
step, and « 0.34 seconds for At = 0.1 [K = 300) us- 
ing the exact gradient formula [55]. The total operation 
count (matrix multiplications, propagation steps, gradi- 
ent evaluations) per iteration for K = 300 is approxi- 



mately 1/10 of the number of operations for K = 3000. 
The time per iteration for K = 300 is slightly greater 
than one tenth of the time per iteration for K — 3000, 
3.05/10 = 0.305 < 0.34. The additional cost reficcts 
the fact the exact gradient evaluations are slightly more 
computationally expensive than the first-order approxi- 
mation. 



C. Dynamic Search-length Adjustment 

Another crucial parameter in the gradient-based se- 
quential optimization is the choice of the search length 
a. Fig. [7] shows that the rate of convergence depends 
strongly on a even if all other parameters are the same, 
and we therefore require a way to choose a suitable search 
length based on a simplified local model of the function 
to be optimized. As previously explained, a quadratic 
approximation, say F{a), to the fidelity change 

F{a) = 5(f + aAffe) - iJ(f ) (43) 

in some direction Af^ is often sufficient for the purposes 
of our optimization problems. Such a second-order model 
F is determined by matching two quantities in addition 
to the obvious F(0) = = F(0), for which it is natural to 
choose the derivative f (0) = ^"(0) and value F{aQ) = 
F(ao) at some ap. Assuming that F'(0) = AfJ • VfcS'(f) 
is strictly positive, such as with Af^ = V kS{{) the non- 
zero gradient, all possible F are equivalent up to scale 
(and a sign), so that we can just consider an appro- 
priately invariant quantity e.g. ^ = 1 — -^tj^^- Since 

F{a) = F'{0){a - ^a^), for ^ > then F has its sec- 
ond root at a = so its maximum at a* :— while 

otherwise F is simply unbounded. If the quadratic ap- 
proximation is accurate, (2^)~^ is then close to the factor 
by which we must multiply the current search length ao 
in order to maximize the fidelity gain F in the current 
step. Note that this choice of new search length a^, makes 
sense even in general, since ^ measures the relative error 
in the linear expansion _F'(0)q; of F{a) at ao and shrinks 
or grows ao according to whether and how much this 
error is above or below one half. We are then aiming 
for the largest a for which the gradient based model is 
still relevant, so again the largest reliable fidelity gain, 
although in the general case, when ^"^ is large or nega- 
tive, it is safer to let a* be some fixed multiple, say 2, of 
ao. Fig. [s] though, with F{a) and F{a) computed for a 
randomly chosen time step for problem 1, shows that the 
quadratic model to be an excellent approximation here, 
as the theory would lead us to expect. 

These considerations suggest that the locally optimal 
update strategy is to set the search length a to a* in 
each step. However, this strategy has one problem: at 
the local maximum a* the derivative of the second order 
approximation F{a) vanishes, and thus the derivative of 
F{a) will be approximately zero as well. By continuity 
of the gradient this tends to ensure that the gradient at 
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FIG. 7: Convergence behavior for sequential optimization without penalty (problem 1, At = 0.01, forward-only update) for 
different choices of the search length parameter a shows significant dependence of convergence rate on a (left). The convergence 
plots allow estimation of the "asymptotic" convergence rate as a function of a, shown right for different update strategies. 
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FIG. 8: Actual F{a) and quadratic model F for a randomly 
selected time step tk (problem 1). 



the next time step tk+i will be small, especially for small 
At. Therefore, the most greedy local update strategy 
may not be the best in the long run as it rapidly kills off 
the gradients, thus reducing future gains and decreasing 
the rate of convergence in the longer run. This effect 
will be most pronounced for the back-and-forth update, 
the most greedy update strategy, but it is still significant 
for forward-only and even split update. This suggests an 
alternative strategy for choosing a. Instead of choosing 
a = a^,, we may wish to choose a to be slightly larger or 
smaller than a* . Fig.|8]shows that this will not reduce the 
local gain very much but has a significant impact on the 
gradients. Indeed, Fig. [9] shows that the median fidelity 
averaged over 100 runs using the split-update strategy 
with the most greedy choice of search length a = a* is 



slower in the long run than two variants of search length 
adjustment motivated by the above considerations. Vari- 
ant 3 in the figure is based on a deliberate overshooting 
strategy, choosing a = 1.25q!* in each time step. Vari- 
ant 1 is aimed at slowly changing the search length un- 
til it falls within a desirable range [r'i,r2]Q;* around the 
maximum, and trying to keep it in this range. If the 
current search length a < ria* then we increase a by 
a fixed factor ai, if a > r2a■^, then we decrease a by 
a factor a2- This ensures that whenever a falls outside 
the desirable range, it is increased or decreased in each 
step until it falls back in the desirable range. We chose 
ri = I and ^2 = 1 and ai = 0.99, a2 = l-Ol- The r- 
values are motivated by the quadratic model, while there 
is no simple rule for choosing ai and a2- Our choice 
of factors very close to 1 may appear strange and does 
reduce gains during the first few-hundred time steps if 
the initial choice of a is far from the optimum value, but 
even such a small factor allows a to increase 20-fold in 
a single iteration with K = 300 steps (l.Ol^^^ w 20), 
and gradual changes can facilitate convergence of a to a 
near optimal value, especially for gate optimization prob- 
lems. Fig. [9] suggests that this strategy is clearly better 
than the most greedy one, though slightly worse than the 
systematic overshoot strategy when considering only the 
median fidelity over 100 runs. Systematic overshoot can 
result in large changes in the search lengths especially 
initially, leading to instabilities and occasional failure, 
however. This is evident in Fig. [TOj which shows the 
evolution of the search lengths as a function of the total 
number of time steps executed for three different search 
length strategies. In all three cases the search length a 
quickly approaches a limiting value. The limiting values 
of a for variant 1 and the systematic overshoot strategy 
(variant 3) above are very close, and about 30% larger 
than the limiting value for the most greedy strategy (vari- 
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FIG. 9: Median fidelity for 100 runs as a function of compu- 
tational time for different optimization strategies: sequential 
gradient, sequential Newton and concurrent quasi-Newton up- 
date using BFGS algorithm for problem 1 with At = 0.1 as a 
function of the search length a. 
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FIG. 10: Variation of dynamic search length parameter a 
as the function of the total number of time steps executed 
{K — 300 steps per iteration, At = 0.1, forward-only update) 
for different search length strategies. 



D. Higher-Order Methods 



ant 2). Variant 1, although based on somewhat ad- hoc 
choices, has the advantage of avoiding fluctuations and 
large spikes in the search length, especially during the 
first few hundred time steps when the quadratic model 
may not be very accurate. 

Another reason why avoidance of large changes from 
one time step to the next is desirable is computational 
efficiency. Usually, in optimization problems one would 
apply any change in the search length immediately, but 
this is computationally expensive as it requires the com- 
putation of a matrix exponential exp(— lAti/ [f (tfe)]) for 
the new field i{tk), and the gain of increasing the step 
length for any given time interval is usually small, espe- 
cially when the search length is close to its optimum. To 
avoid such overheads one may choose not to apply the 
search length change at the current step, but only at the 
next time step. This is not too unreasonable as continu- 
ity considerations suggest that the optimal search length 
should not vary too much from one time step to the next, 
and it ensures that the computational overhead of the 
dynamic search length adjustment is negligible and the 
computational cost per iteration is constant as it would 
be for a fixed search length. For sequential optimiza- 
tion over many time slices avoiding the computational 
overhead of multiple fidelity re-evaluations at each time 
step is usually preferable over the small gain achieved by 
applying the search length change to the current time 
interval, and for unitary gate optimization problems in 
particular, the search length usually quickly approaches 
an optimal value and varies relatively little after this, as 



In the previous two sections we have considered chang- 
ing the fields locally in direction Af^ of the gradient 
Vfc5^(f ) using a quadratic model for the fidelity change F 
along that line as a function of the search length a. In 
general though, there is nothing special about the gradi- 
ent direction, unless the local Hessian is a scalar multiple 
of the identity. Thus we should do better if we replace 
the simple gradient update by a Newton update step: 



(44) 



using the full matrix of second order partial derivatives 
Sjk (Hessian), provided that i^^*^-* is strictly negative def- 
inite, given that we are maximizing. If the Hessian Sj^^"^ 
is indefinite or positive definite, the Newton step should 
never be used since it would take us to some irrelevant 
saddle point, or worse the minimum, rather than the de- 
sired maximum, of the quadratic model. In these cases, 
the quadratic model has no unconstrained maximum, 
and as it is anyway only accurate in a neighborhood of 
the original f^, a trust-region method (see Appendix [A|, 
which restricts attention to suitably small changes in f^, 
should be used. 

For sequential update the Hessian matrix for the fcth 
time interval is Sj 



(k) 



shown in Fig. 10 



— at — , which is an M X M 
matrix, M being the number of controls. We can easily 
derive an analytic expression for it from Eq. (19); taking 
^ = ^^TiiW'fUtiT)), for example, we obtain 



iSRTr (w^UriT,tk)J^J:lUf{tk-i,0)) , (45) 
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FIG. 11: Accuracy of scalar Hessian approximation as a 
function of the optimization time steps. The relative error 
— II , where P is the average of the diagaonal ele- 

ments of i5, is quite large initially but the off-diagonal ele- 
ments of quickly die off, and after only 3000 time steps (10 
iterations a.t K = 300 time steps per iteration), the relative 
error of the scalar approximation is on the order of 5%. 



where Jmn is the double integral 

Jinl= J J Uf{tk,T)HrnUt{T,a)HMa,tk-l)dc7dT. 

tk-i<cr<T<tk 

(46) 

For piecewise-constant controls f (it) we can again evalu- 
ate this expression exactly. Indeed, if iH[i{ti.)] — VAV^ 
is an eigendecomposition of iH[f{tk)] and \vr) are the 
columns of V and the corresponding eigenvalues, then 



{Vr\J^l\Vs) 



0<a<T<At 



^-^Vr\Hrr^Vg)e-^■'^^-^\v,\Hn\v,)e-^'^dadT 

= ^ Drqs{Vr\Hm\Vq){Vq\Hn\Vs) 



with coefficients of 



Ate 



D 



rqs 



Ate 



- [1 - '^{uJqrAt)] Xq = Xg ^ Xr 

^^2g-A,,At Aq = A, = Xr 

(47) 

Assuming that we have already computed the exact 
gradient, then the eigendecomposition of iH[^{tk)\ and 
{vr\Hm\vs) are known for all H„n thus we can in principle 
evaluate the exact Hessian without too much additional 
computational effort, but the computational overhead of 
evaluating the Hessian and inverting it still needs to be 
carefully weighed against the potential gains. 



For small At we can approximate the double integral 

Ji^2 « \M''Ui{tk,Tk){H,n,Hn}Uf{TkM-i) (48) 



where Tk = tu - 5 At = tk-i + 5 At and {iJ„,i7„} = 
HmHn + HnHm is the anti-commutator, which yields 

^i^l « -^'RTT{W^Uf{T,Tk){H,n,H,,}UfiTk,0)) , 

and if W^Uf{T,0) w I then W^Uf{T,Tk) w ^7f(rfe,0)^ 
thus cycling products under the trace, we obtain 



(fe) 

mn 



^5RTr(ff,„iJ„). 



(49) 



So if the control Hamiltonians Hm are orthonormal with 
respect to the usual Hilbert-Schmidt inner product, i.e., 
Tr(i?mi?„) = then we expect the Hessian Sj^'^'' to 
approach a multiple of the identity — at least for 
sufficiently small At and fidelity sufficiently close to its 
maximum of 1. In this limit the Newton update reduces 
to the greedy gradient update with the search length a* 
based on a quadratic model of F{a) discussed in the pre- 
vious section, since the gradient and Newton directions 
then coincide. Thus, if the local Hessian is close to a 
scalar matrix then evaluation and inversion of the Hes- 
sian simply adds extra computational overhead over the 
greedy gradient update. Furthermore, considering that 
the greedy gradient update is suboptimal globally, the 
Newton method may actually achieve worse convergence 
per iteration in the long run than the modified gradi- 
ent update with overshoot, although the Newton method 
could be adapted to incorporate a scaling factor 7 to 
achieve a similar deliberate over- or undershoot effect. 

These observations are confirmed by Fig. |9j showing 
that the sequential Newton update does not perform 
well in the long run for problem 1, which clearly sat- 
isfies Tr^HmHn) = Smn- Closc inspection shows that 
the Newton update outperforms the gradient update for 
the first few iterations, consistent with Fig. [TT] which 
shows the relative error of the scalar Hessian approxi- 
mation lli^^*^-* — /3I||/||^^'^'' II where /3 here is the average 
of the diagonal elements of io^'^^ as the function of the 
total number of time steps executed. As expected, the 
scalar matrix approximation is a very poor fit initially 
but after approximately 3,000 time steps (10 iterations 
with K — 300 time steps) the error of the scalar approx- 
imation is approximately 5%. Despite the fact that our 
time steps At = 0.1 are not small (max|a;„„i|At ^ 1), 
and ( 48 1 is therefore not a very good approximation and 



the Hessian in the limit is not exactly diagonal, after a 
few iterations the diagonal elements are still sufficiently 
small to be negligible. This illustrates an important dif- 
ference between approximating the gradient and Hessian 
— for both of these, it is the relative error which deter- 



mines how accurate the step (44) will be, but contrary 



to the gradient, the Hessian does not tend to vanish at 
high fidelities, so that cruder approximations suffice to 
usefully estimate it. 

The condition Tr(iJ,„iJ„) cx 6mn, implying S^^''^ — )■ /3I 
for gate optimization problems, is clearly satisfied for 



14 



problems involving qubits or spin-^ particles with mul- 
tiple independent local controls such as X^"\ F^") or 

and can always be made to hold by an application 
of the Gram-Schmidt orthonormalisation process to the 
control matrices. This argument does not apply however, 
to pure-state transfer, tracking or observable optimiza- 
tion problems, for which the Hessian in the limit can be 
arbitrary. For instance, if we consider the simplest case, 

= 3?($|*f(T)), then the local Hessian for the kth 
time slice, assuming At not too large, is 

j^C^) « 5R($|C/f(T,rfe){i/™,iJ„}|*f(rfe)). (50) 

This expression will vanish for all and regardless of 
the initial and target state if Hm and Hn anti-commute, 
but in general it need not vanish even at the global 
maximum, and {iJm,i?„} = for all m,?i > is a 
much stronger condition than orthogonality, which is 
generally not even satisfied for spin systems with inde- 
pendent local controls on different qubits because e.g.. 

This suggests a dynamic choice of update rule depend- 
ing on the type of problem considered. For gate optimiza- 
tion problems it may be useful to do a few trust-region 
update steps initially, possibly switching to a simplified 
Newton update as the Hessian approaches a diagonal ma- 
trix, before finally switching to a gradient update with 
a search length of or determined as described in the 
previous section. For other optimization problems such 
as state transfer or observable optimization, trust-region 
update is likely to be advantageous, although the added 
computational cost of evaluating the Hessian must be 
taken into account. This cost can be amortized, how- 
ever, by exploiting the similarity between i^^*^-* at adja- 
cent fcs for a given field along a sweep, and the fact that 
each i^^'^^ individually converge as the field converges. 
Fig. |9] also suggests that sequential update algorithms 
initially lead to much larger gains in the fidelity than 
the most common concurrent update strategy based on 
a quasi-Newton algorithm with BFGS Hessian update 
|50j . However, the initial advantage of the sequential 
update diminshes as the optimization proceeds and the 
concurrent update overtakes the sequential varieties at 
high fidelities. The issue of comparison of sequential and 
concurrent update algorithms is explored in detail in |51j , 
which confirms this observation for a range of gate opti- 
mization problems, and suggests the development of hy- 
brid strategies. 



V. MONOTONICITY AND RATE OF 
CONVERGENCE 

If we allow the step size s to be arbitrarily small, or in 
the extreme case let it vanish so that we are considering 
the instantaneous or continuous method, we have seen 
that analysis of the sequential scheme reduces to that 
of 37^7^, because for all the fidelity functions we are 



considering, and even more generally, the fidelity varies 
in a linear fashion with respect to such local changes in 
the fields. Once we move to a fixed step size however, 
an accurate approximation to ^{{ + abglm) can require 
arbitrarily high order derivatives of 5^, and the most we 
can say in general regarding the s — > regime is that the 
number of derivatives required to achieve a given accu- 
racy for a scales as 1/s. This is a weak result, however, 
which does not reveal much more than we had already 
established about different choices of a. To get stronger 
results we need to distinguish at least two cases. The 
first is the unitary gate problem of or and the 
second the pure state problem of , which using the ad- 
joint representation encompasses ^2 and therefore also 
and S^ih as special cases of this latter. Contrary to the 
vanishing s situation the analysis for finite step size s is 
quite context sensitive and it is convenient to describe 
our results for the single control case before generalizing 
to several controls. 



A. Quadratic Structure 

Let us consider, at a given value of the single field 
/, how the fidelity '■= t?(/ -I- abs^p^) varies as the 
fk component of the field, corresponding to the basis 
function b^ p^^ , is changed by an amount a. For all fidelity 
functions, integrating up the lower bound on their second 
derivative gives the quadratic lower bound 

^\a>m + a^^ + qa^ (51) 

Ofk 

for q equal to some global constant qi,. This immedi- 
ately means any a between and — ^ must lead 
to an increase in ^, so that we have already identified 
a whole class of schemes monotonically increasing in ^. 
In the unitary cases, what is interesting is that we can 
also find an upper bound of this form, with q — qa, and 
such that qa — >■ as £, s — >■ 0. The actual fidelity along 
this local change in the field 5|a is therefore increasingly 
constrained as we approach the asymptotic instantaneous 
regime, so that an increase in ^ can only happen for a 
between and ^^{\\Hi\\l + 0{^f^ + s)) / . Over this 
a interval of interest, the difference between the bounds, 
therefore also the error incurred by the second order Tay- 
lor expansion of about 0, is <S.O{V^-\- s). More gen- 
erally, in the unitary multiple control cases, we have that 
the local Hessian entries 

= ^[Tr(iJ„F„) +0(n/€ + .)], (52) 

and the error in the second order expansion of 5(f) is still 
€0(\/€ + s) for any change in resulting in an increase 
of ^. 

This offers a classification of those a leading to mono- 
tonically increasing algorithms which coincides with the 
one of the previous section for fixed a and arbitrarily 
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FIG. 12: Static (red, above) and stateful (blue, below) bounds 
and static overestimate (yellow, in between) on the asymp- 
totic rate of convergence for step size s = 4f[|^^HoJl[- This is 
in the idealized case of Qa ~ <lb, but otherwise the left half of 
the figure would still be as shown with the proportional search 
length interpreted as aqt- The static bound is in fact infinite 
at I since we cannot strictly rule out attaining a global maxi- 
mum in a single step, but in practice it is standard to assume 
this can never happen, in which case the static overestimate 
gives an impression of what the bound would look like. 



small s, but is also applicable to the practically relevant 
context with both a and s fixed. It is also interesting 
to note that as 5'|q is linear for arbitrarily small s, it is 
natural to add to it a purely quadratic cost term £ to 
ensure Z = -S + ^ has a unique global maximum with 
respect to changes in / by 6s — but for s fixed, such an 
alteration of the objective is inappropriate as it interferes 
with the already quadratic nature of In practice we 
also find that its second order Taylor expansion is a very 
good approximation to even when s and especially £ 
are quite some distance away from the limiting value of 0, 
and it therefore does not seem worth considering higher 
order expansions. Going beyond second order would also 
be quite problematic for more than one control as we 
would not have a reliable way of finding even local max- 
ima based on this information. 



B. Asymptotic Rate of Convergence 

One guiding principle of numerical optimization is to 
be greedy: we wish to update the variables whenever 
enough information can be obtained to do so intelligently 
and aim to induce the largest possible increase in the ob- 
jective. This would point towards back-and-forth sweep- 
ing, going to the maximum in on each step, as the 
most efficient strategy as forward-only sweeping wastes 
the opportunity to increase the fidelity when propagating 
the backwards ODE. In the single field and unitary cases 
of the previous subsection the second order Taylor expan- 
sion generally provides a good estimate for the location 



of the maximum of closest to 0, and using this a to 
update ffc gives a canonical optimization algorithm from 
the set of all possible strategies leading to a monotonic 
increase in the fidelity. 



Unfortunately, as we have seen, this is not the best 
strategy as it is susceptible to rather extreme slowdown 
in the long run. The problem is that the fidelity is ef- 
fectively quadratic and therefore going for the nearest 
maximum of fidelity is equivalent to making the local 
gradient as close to zero as possible. Moreover, as the 
Hessian converges to a fixed value in the limit the in- 
crease in fidelity achievable in this way is proportional to 



dfk 



the gradient norm squared. Since the gradient 

the continuous function jj^^ integrated against bs^p^. , it 
cannot change much as we step to the next basis function 
bs,pk+i centered at an adjacent point Pk+i, as is always 
the case for back-and-forth sweeping. Therefore taking 
the largest gains available on the current step, as with the 
canonical greedy algorithm, precludes large gains being 
made on subsequent steps. It is not immediately obvi- 
ous, however, what the effect of this will be overall. To 
answer this question, we derive bounds on the rate of 
convergence, in particular the asymptotic rate, as we are 
already in the regime of small infidelity (£ throughout. 
This stateful bound on the asymptotic rate of conver- 



gence is compared to the static bound in Fig. 12 for an 
illustrative choice of the step size s and every valid search 
length a. The combined bound reproduces the bimodal 
profile of asymptotic rate vs search length observed in 
Fig. [t] (right) — the rate must vanish towards the ends 
of the interval of search lengths making fidelity increase, 
but it must also be small for greedy search lengths in the 
middle of the interval. 



In the remaining pure state, density matrix and ob- 
servable cases, the situation is less decisive; in particu- 
lar, the local Hessian need not converge to any prede- 
termined value as £ and s vanish. Naturally the fidelity 
with respect to local change in some ffe can only be ac- 
curately approximated up to the nearest local maximum 
by its second order Taylor expansion when the Hessian 
is not too small as otherwise this quadratic model does 
not even have a clear maximum. In the asymptotic in- 
stantaneous regime of small £ and s, however, we do have 
that when the Hessian is small, the local gradient must be 
too, implying that substantial increases in fidelity require 
large changes be made to fk ■ Our lack of certainty in the 
asymptotic local Hessian values precludes rigorously ex- 
tending the clean picture from Fig.[T2]to these cases, but 
the intuition behind it is equivalent. We must choose 
at each step between maximizing the immediate fidelity 
gain and restricting future gains by having a small gradi- 
ent, and possibly introducing undesirably large peaks in 
the field by making large changes to the field amplitude. 
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FIG. 13: Combined plot of the infidelity and instantaneous 
rate across several sweeps comparing their actual and mod- 
elled values. The rate model is obtained by a two parameter 
least-squares fit of ( 53 1 to the rate data, which itself comes 



from forward differences of the infidelity logarithm. The infi- 
delity model is based on the same r* and ro, combined with 
a fitted value for the additional scale factor involved. The 
plotted run is the one with median sum of squares error in 
the infidelity model across a set of 100 runs. 



C. General Rate Behavior 

In the asymptotic 2 — >■ regime the behavior of dis- 
cretized sequential optimization methods is determined 
by their linearization as discrete dynamical systems, in- 
dependently of the specific objective or fidelity, number 
of controls, or type of sweep. This viewpoint motivates 
the following ansatz to describe the (instantaneous) rate 
of convergence at high fidelities: 



1 



ro 



gn(ro-r') „ I 



(53) 



where n is the iteration number, r* the asymptotic rate 
and (r* -I- ro)/2 the initial n = rate. Upon integra- 
tion this provides a model for log(2;) up to a scale factor, 
and hence a model for ^ with only 3 parameters. While 
this model is motivated by analysis of the asymptotic 
regime it appears to approximate both the rate and fi- 
delity remarkably accurately down to the first iteration 
as illustrated in Fig. [13] 



VI. CONCLUSIONS 

In the context of quantum control, the method due 
to Krotov is usually presented in a continuous formula- 
tion and only discretized a posteriori for numerical use. 
However, we have seen how even the fundamental mono- 
tonicity property is jeopardized if the discretization scale 
At is not taken into account when choosing the update 
rule. It is thus natural to view the discretized method, 
which is a sequentially updating optimization algorithm. 



as more fundamental and consider its continuous ana- 
logue as an instantaneous limit of this. The fidelity with 
respect to local changes in the field made in each step 
of such a sequential algorithm is essentially a quadratic 
function, which becomes linear in the instantaneous limit. 
Addition of a penalty in the continuous Krotov method 
can therefore be seen as a way of making the objective 
function quadratic, from which a canonical update for- 
mula emerges. Such a penalty is unnecessary for the dis- 
cretized method, however, and in fact undesirable from 
both a theoretical and practical point of view. 

At this stage a large class of monotonic update schemes 
with different sets of search lengths are available and in 
the literature the choice is typically left to the user. How- 
ever, the search length is an important parameter that 
strongly influences the performance characteristics of the 
algorithm and guidance in selecting it is thus critical. 
The instrumental notion in doing so is the asymptotic 
rate of convergence, which had previously been shown 
[5^ to be qualitatively at least linear, but for which no 
quantitative estimates were available. At first glance the 
natural choice of update scheme is the greedy one, maxi- 
mizing immediate fidelity gains, but the asymptotic rate 
of convergence analysis shows that we cannot extrapolate 
from the initial rate of convergence, and proves this to be 
a poor strategy in the long run. Fortunately, the reason 
behind this failing also emerges from the analysis and 
we are able to offer modifications to the greedy search 
length or back-and-forth sweeping that enhance perfor- 
mance. The analysis explains why forward-only sweep- 
ing appears to be the preferred strategy in the literature, 
and suggests further improvements such as a split sweep 
that logically extend the advantage of forward-only over 
back-and-forth sweeping. 

In discretizing the continuous method it is also com- 
mon to use YTn-^At, where is the derivative in the 

f>J(tk) ' afk 

instantaneous limit. However, this is only an approxi- 
mation that is liable to break down and corrupt the al- 
gorithm, especially towards low infidelities £ or larger 
time steps At. We address this issue by outlining the 
exact method and various series expansions appropriate 
for computing these gradients for the most common 
choice of piecewise constant controls for each of the fi- 
delity functionals under consideration. In selecting an 
update direction and search length one is naturally lead 
to use second derivative (Hessian) information, for which 
an exact formula is also available. In contrast to the 
situation for concurrent update optimization algorithms, 
the analysis also shows however that using the full lo- 
cal Hessian generally does not result in a performance 
improvement compared to a dynamic search length ad- 
justment based on a quadratic model, at least for unitary 
gate optimization problems. 

Looking forward, the general formulation of sequential 
methods applied to a wide range of control optimization 
problems that arise in the context of quantum control, 
as well as simplified convergence results should enable a 
streamlined application of these methods to optimizing 
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other fidehty functions. The fact that any parametriza- 
tion of the fields in terms of localized functions can be 
used for the sequential optimization opens the way for 
more problem-adapted choices than the standard top-hat 
function. Finally, the study of the convergence rate initi- 
ated herein should enable further development of heuris- 
tics to make more efficient choices of search lengths and 
sweeping patterns, and the development of hybrid meth- 
ods that take advantage of the rapid improvements at- 
tainable by sequential update in the initial phase of the 
optimization to find suitable candidate fields with moder- 
ately high fidelities before switching to alternative strate- 
gies such as concurrent update to avoid the convergence 
slowdown of sequential update methods in the asymp- 
totic regime. Taking together such improvements in the 
efficiency of the algorithms employed for dynamic con- 
trol optimization should facilitate the application of the 
method to a wide range of coherent control problems and 
more realistic systems with larger Hilbert space dimen- 
sions or systems involving many qubits. 
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Appendix A: Trust-Region Methods 

The trust region sub-problem (TRSP) consists in find- 
ing a value of x with < r such that }^x^ Ax -\- x 
attains its minimum possible value over this ball of radius 
r. Since A can always be taken symmetric, in a suitable 
eigenbasis, it can be expressed as a diagonal matrix with 
diagonal elements Ai < A2 . . . < A„. We implicitly work 
in this basis in what follows, so in particular let (71, ... , gn 
be the components of the vector g expressed in this ba- 
sis. The key to solving this problem is finding the scalar 
/i* < min{Ai, 0} such that x^-* defined by 

x^ = -{A-^lI)-^g (Al) 

has norm as close to r as possible. If there exists /i* 
such that = r then x^* is the unique solution of 

the TRSP corresponding to a minimum at the bound- 
ary of the trust region. If ||a;^*|| < r and /i* = < Ai 
then the unique solution of the TRSP is x^^ = —A^^g, 
corresponding to a minimum in the interior of the trust 
region. If ||a;^. || < r but < Ai < then ei (the eigen- 
vector corresponding to Ai) is a direction of decrease and 
we can change the first component of 2:^. (in the chosen 
eigenbasis) to reach norm r, and this point is a (not nec- 
essarily unique) solution to the TRSP corresponding to 
a minimum at the boundary. 

To find fi* , one can find the roots of ^ -I- (p{n), where 



As iy9 is a convex increasing function for fi < min{Ai, 0} 

2 

and its derivative (f'ifJ.) = ~v(/^)'^Sfe=i (Afc^V^)^ read- 
ily available, an efficient strategy is to use Newton root 
finding starting from /io = min{Ai,0}. If ^3 (/io) !i ~~, 
then fiQ = fJ-* ■ When fiQ = Ai and 51 7^ care must be 
taken to use 95' (/Xg) = 

Appendix B: Convergence Results for Discretized 
Problem 

As in the continuous case, convergence of the sequence 
{^("■)} (as a sequence of real numbers) is easy to estab- 
lish provided (i) the objective functional ^{i) is bounded 
above if we are maximizing, or below if we are mini- 
mizing, and (ii) the update scheme ensures monotonicity 
gr(n+i) —g-l") > 0, as a monotonically increasing (decreas- 
ing) sequence of real numbers that is uniformly bounded 
above (below) is convergent. However, ideally we would 
like to know that we actually converge to a field f* that is 
a critical point, and even better a global maximum (min- 
imum) of the objective ^. The latter convergence prop- 
erty is not a trivial matter since optimizing parameters 
sequentially can lead to iterates spiraling into a closed 
path without the gradient of the function converging to 
zero [55] . 

Sequential update schemes amount to iteratively up- 
dating a set of coordinates ei,...,e^ in a certain 
order. Specifically, we have K = K and = 
ek for forward-only update , K = 2{K — 1) and 
ei, . . . , ck, . . . , 62 for back-and- forth sweeping, and K = 
K and ei, . . . , eLK/2j ,6^, . . . , eLK/2j+i for split update. 
In general e„, equal to Ck for k = n mod if, is the di- 
rection of the change between x" and x"~^. In what 
follows, the full derivative is written d, individual coordi- 
nates are referenced through subscripting, and we make 
use of the shorthand 9„ for the partial derivative in co- 
ordinate Cn-i-l- 

Theorem 1 // we are sequentially maximizing an ana- 
lytic function F : M.^ — >■ M with uniformly bounded sec- 
ond derivatives 

d^F 

dxidxj ~ ^' ^ ^ 

then we can obtain F(a;"+^) — i^(x") > ^|9„F(a;")p for 
fi = on each iteration. If this inequality holds for some 
fi and the lengths of our searches are forced to satisfy 

\K^'-il\<j\d^F(x-)\ (B2) 

for some constant 7 then the sequence of iterates 
either diverges to infinity or converges to a stationary 
point. 

Proof To verify the first claim, suppose we vary the i"^ 
coordinate. — /3 < ^-5- over this line implies that F is 
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strictly increasing between x and x' = x + jj§^ei and 



dF 



dxi 



the value of F at x' is at least F{x) + ^ 

Next notice that updating along any coordinate can 
only alter the derivative ^ by at most /3|iJ^+^ — < 
Pl\dnF{x'^)\, so over several iterations we have 

k-l 



dF 
dxi 



< 



dF 

dxi 



J2Pl\dn+,F{x-+^) 



for any non- negative k. Letting fc^ + 1 be the smallest 
index within 1, . . . ,K such that in+ki+i — ^i, therefore 

K fei-1 

||dF(a;")|li < J2 |a„+fe.F(x"+'=-)| + E l3l\dn+,F{x-+r^\ 

i=l j=0 
K-l 

<aJ2\d,,+,F{x-+^)\ 

3=0 

with a = 1 + {K - l)l3j. Hence 

F(a;"+^') - F{x'') 
k-l 

>J2f,\d^+,F{x-+^)\' 

3=0 



'K-l 



* 3=0 



K-l 



>^iidF(x")ii,Ei^:::r^ 

>J^||dF(2;")||i|lx"+^^-:r'li 

A (77 



But this is exactly the primary descent condition of |54) . 
and by the main result in this paper, the sequence a;" 
either diverges, i.e., ||x"|| — > oo, or converges to some 
point X* . In any case, if remains bounded we have 



n=0 



\dnF{x")\'^ <-( lim F{x") - F{x°)) < oo 



implying dnF{x^) — )■ and by the earlier bound, 
||d-F(a:")|| — > as n — > oo; in particular when a;* exists, 
dF{x*)^0. □ 



The same argument holds for objectives with or with- 
out a penalty term. However, if we add a standard 
weighted norm squared penalty term £, this norm of the 
controls is guaranteed to be uniformly bounded, so there 
is no way the controls can fail to converge at all. 
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